[SPARK-40862][SQL] Support non-aggregated subqueries in RewriteCorrelatedScalarSubquery#38336
Closed
allisonwang-db wants to merge 2 commits intoapache:masterfrom
Closed
[SPARK-40862][SQL] Support non-aggregated subqueries in RewriteCorrelatedScalarSubquery#38336allisonwang-db wants to merge 2 commits intoapache:masterfrom
allisonwang-db wants to merge 2 commits intoapache:masterfrom
Conversation
815d3e1 to
0120a70
Compare
Contributor
Author
|
cc @cloud-fan |
cloud-fan
reviewed
Oct 25, 2022
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala
Outdated
Show resolved
Hide resolved
cloud-fan
approved these changes
Oct 25, 2022
jchen5
approved these changes
Oct 26, 2022
Contributor
jchen5
left a comment
There was a problem hiding this comment.
Looks good to me, one small comment
Contributor
There was a problem hiding this comment.
Looks like this comment needs to be updated - in the new case it's returning None, rather than the inner query block below HAVING (and this is ok because we only needed the aggregate to fix the COUNT bug). Right?
Contributor
There was a problem hiding this comment.
Yea, we should at least explain when the third part can be None.
cloud-fan
reviewed
Oct 27, 2022
Contributor
There was a problem hiding this comment.
how about we make the case 1 result a lazy val? (multiple variable lazy val looks weird)
lazy val planWithoutCountBug = Project(... // Or just val as constructing logical plan is cheap
if (resultWithZeroTups.isEmpty) {
planWithoutCountBug
} else {
val (topPart, havingNode, aggNode) = splitSubquery(query)
if (aggNode.isEmpty) planWithoutCountBug else ...
}
f24ead1 to
8ea1954
Compare
8ea1954 to
7f4cf74
Compare
cloud-fan
approved these changes
Oct 28, 2022
Contributor
|
thanks, merging to master! |
SandishKumarHN
pushed a commit
to SandishKumarHN/spark
that referenced
this pull request
Dec 12, 2022
…atedScalarSubquery ### What changes were proposed in this pull request? This PR updates the `splitSubquery` in `RewriteCorrelatedScalarSubquery` to support non-aggregated one-row subquery. In CheckAnalysis, we allow three types of correlated scalar subquery patterns: 1. SubqueryAlias/Project + Aggregate 2. SubqueryAlias/Project + Filter + Aggregate 3. SubqueryAlias/Project + LogicalPlan (maxRows <= 1) https://github.com/apache/spark/blob/748fa2792e488a6b923b32e2898d9bb6e16fb4ca/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L851-L856 We should support the thrid case in `splitSubquery` to avoid `Unexpected operator` exceptions. ### Why are the changes needed? To fix an issue with correlated subquery rewrite. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? New unit tests. Closes apache#38336 from allisonwang-db/spark-40862-split-subquery. Authored-by: allisonwang-db <allison.wang@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
This PR updates the
splitSubqueryinRewriteCorrelatedScalarSubqueryto support non-aggregated one-row subquery.In CheckAnalysis, we allow three types of correlated scalar subquery patterns:
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
Lines 851 to 856 in 748fa27
We should support the thrid case in
splitSubqueryto avoidUnexpected operatorexceptions.Why are the changes needed?
To fix an issue with correlated subquery rewrite.
Does this PR introduce any user-facing change?
No
How was this patch tested?
New unit tests.